HDDS-14041. Add metrics to track Snapshot RocksDB space and SST File stats.#9406
HDDS-14041. Add metrics to track Snapshot RocksDB space and SST File stats.#9406jojochuang merged 9 commits intoapache:masterfrom
Conversation
There was a problem hiding this comment.
Pull request overview
This PR adds comprehensive metrics tracking for Ozone Manager's snapshot RocksDB directories, monitoring disk space usage and SST file counts. The implementation introduces a new metrics class that periodically collects statistics both at the aggregate level (total snapshots directory size and SST file count) and per-checkpoint directory level, with configurable update intervals.
Key Changes
- Introduces
OMSnapshotDirectoryMetricsclass with asynchronous metrics collection using a Timer-based scheduler - Adds configuration property
ozone.om.snapshot.directory.metrics.update.interval(default: 5 minutes) to control update frequency - Integrates metrics lifecycle into OzoneManager's start/restart/stop methods
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 15 comments.
Show a summary per file
| File | Description |
|---|---|
OMSnapshotDirectoryMetrics.java |
New metrics class implementing periodic collection of snapshot directory size and SST file statistics with both aggregate and per-checkpoint metrics |
OzoneManager.java |
Integrates snapshot directory metrics into OM lifecycle by starting metrics collection on start/restart and stopping on shutdown |
OMMetrics.java |
Adds snapshot directory metrics management methods and field to hold the metrics instance |
OMConfigKeys.java |
Defines configuration key and default value for metrics update interval |
ozone-default.xml |
Adds configuration property documentation for the metrics update interval setting |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
@sadanand48 also I updated the jira release note. PTAL |
swamirishi
left a comment
There was a problem hiding this comment.
Thanks for the patch @sadanand48 have some nitpicky review comments on the code modularity and design of this. Let us make this change to beautify the code a bit so that it would be easier to add more metrics later on.
| */ | ||
| @InterfaceAudience.Private | ||
| @Metrics(about = "OM Snapshot Directory Metrics", context = OzoneConsts.OZONE) | ||
| public final class OMSnapshotDirectoryMetrics implements MetricsSource { |
There was a problem hiding this comment.
Instead of having inner static classes. I would suggest to create snapshot.metrics package have these package private classes. It would make the code more readable.
swamirishi
left a comment
There was a problem hiding this comment.
Have a few doubts left comments inline
| */ | ||
| private long calculateDirSizeAccountingForHardlinks(File directory) | ||
| throws IOException { | ||
| Set<Object> visitedInodes = new HashSet<>(); |
There was a problem hiding this comment.
Shouldn't this be a globalSet ? We can make this class variable and clear at the end of each run to reuse
There was a problem hiding this comment.
the directory here is the db.snapshots/checkpointState dir that has all checkpoint dirs so we should be good
|
|
||
| // Add per-checkpoint-directory metrics from cached map | ||
| Map<String, CheckpointMetrics> currentMetrics = checkpointMetricsMap; | ||
| for (Map.Entry<String, CheckpointMetrics> entry : currentMetrics.entrySet()) { |
There was a problem hiding this comment.
currentMetrics is a HashMap order would be indeterministic. We should use a treeMap to ensure this
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 6 out of 6 changed files in this pull request and generated 9 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
This PR has been marked as stale due to 21 days of inactivity. Please comment or remove the stale label to keep it open. Otherwise, it will be automatically closed in 7 days. |
|
Thank you for your contribution. This PR is being closed due to inactivity. If needed, feel free to reopen it. |
What changes were proposed in this pull request?
Add metrics to track Snapshot RocksDB space and SST File stats
What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-14041
How was this patch tested?
Tried out on docker